
330 Itertools for Combinatorics – Permutations and Combinations
from Spurious Correlations, http://www.tylervigen.com, to show how this will work.
We’ll pick three datasets with the same time range: the datasets numbered 7, 43, and 3,890.
We’ll simply catenate the data into a grid. Because each source dataset has its own year
column, the combined grid repeats that column, and we’ll start with all of the repeated
year columns in place. We’ll eventually remove the obvious redundancy, but it’s often best
to start with all of the data present as a way to confirm that the various sources of data
align with each other properly.
This is how the first row (the column titles) and the remaining rows of the yearly data will look:
[('year', 'Per capita consumption of cheese (US)Pounds (USDA)',
  'Number of people who died by becoming tangled in their bedsheets Deaths (US) (CDC)',
  'year', 'Per capita consumption of mozzarella cheese (US)Pounds (USDA)',
  'Civil engineering doctorates awarded (US) Degrees awarded (National Science Foundation)',
  'year', 'US crude oil imports from Venezuela Millions of barrels (Dept. of Energy)',
  'Per capita consumption of high fructose corn syrup (US) Pounds (USDA)'),
 (2000, 29.8, 327, 2000, 9.3, 480, 2000, 446, 62.6),
 (2001, 30.1, 456, 2001, 9.7, 501, 2001, 471, 62.5),
 (2002, 30.5, 509, 2002, 9.7, 540, 2002, 438, 62.8),
 (2003, 30.6, 497, 2003, 9.7, 552, 2003, 436, 60.9),
 (2004, 31.3, 596, 2004, 9.9, 547, 2004, 473, 59.8),
 (2005, 31.7, 573, 2005, 10.2, 622, 2005, 449, 59.1),
 (2006, 32.6, 661, 2006, 10.5, 655, 2006, 416, 58.2),
 (2007, 33.1, 741, 2007, 11, 701, 2007, 420, 56.1),
 (2008, 32.7, 809, 2008, 10.6, 712, 2008, 381, 53),
 (2009, 32.8, 717, 2009, 10.6, 708, 2009, 352, 50.1)]
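Once the alignment has been confirmed, the redundant year columns at positions 3 and 6 can be dropped. One possible sketch uses itertools.compress() with a list of keep/drop selectors. The drop_columns() helper is hypothetical, and the two sample rows are copied from the grid above.

```python
from itertools import compress

# Two rows copied from the grid above; positions 3 and 6 repeat the year.
rows = [
    (2000, 29.8, 327, 2000, 9.3, 480, 2000, 446, 62.6),
    (2001, 30.1, 456, 2001, 9.7, 501, 2001, 471, 62.5),
]


def drop_columns(row: tuple, drop: set[int]) -> tuple:
    """Keep every column whose index is not in `drop` (hypothetical helper)."""
    keep = [i not in drop for i in range(len(row))]  # True means keep the column
    return tuple(compress(row, keep))


trimmed = [drop_columns(r, {3, 6}) for r in rows]
print(trimmed[0])  # (2000, 29.8, 327, 9.3, 480, 446, 62.6)
```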
This is how we can use the combinations() function to yield all the combinations of the
nine variables in this dataset, taken two at a time:
>>> from itertools import combinations
>>> pairs = list(combinations(range(9), 2))
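The count of pairs is the binomial coefficient C(9, 2) = 9 × 8 / 2 = 36, which math.comb() (Python 3.8+) computes directly. The short check below compares that value with the length of the materialized iterator and shows that combinations() emits the index pairs in lexicographic order, each unordered pair exactly once.

```python
from itertools import combinations
from math import comb

# Materialize the iterator so it can be measured and indexed.
pairs = list(combinations(range(9), 2))
print(len(pairs), comb(9, 2))  # 36 36
print(pairs[:4])  # [(0, 1), (0, 2), (0, 3), (0, 4)]
print(pairs[-1])  # (7, 8)
```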
There are 36 possible combinations. We’ll have to reject the combinations that involve